Error Mining for Wide-Coverage Grammar Engineering

نویسنده

  • Gertjan van Noord
چکیده

Parsing systems which rely on hand-coded linguistic descriptions can only perform adequately in as far as these descriptions are correct and complete. The paper describes an error mining technique to discover problems in hand-coded linguistic descriptions for parsing such as grammars and lexicons. By analysing parse results for very large unannotated corpora, the technique discovers missing, incorrect or incomplete linguistic descriptions. The technique uses the frequency of n-grams of words for arbitrary values of n. It is shown how a new combination of suffix arrays and perfect hash finite automata allows an efficient implementation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Correcting Syntactic Annotation Errors Based on Tree Mining

This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage ...

متن کامل

Automated Multiword Expression Prediction For Grammar Engineering

However large a hand-crafted widecoverage grammar is, there are always going to be words and constructions that are not included in it and are going to cause parse failure. Due to their heterogeneous and flexible nature, Multiword Expressions (MWEs) provide an endless source of parse failures. As the number of such expressions in a speaker’s lexicon is equiparable to the number of single word u...

متن کامل

Engineering a Wide-Coverage Lexicalized Grammar

We discuss a number of practical issues that have arisen in the development of a wide-coverage lexicalized grammar for English. In particular, we consider the way in which the design of the grammar and of its encoding was influenced by issues relating to the size of the grammar.

متن کامل

Application of Satellite Data and Data Mining Algorithms in Estimating Coverage Percent (Case study: Nadoushan Rangelands, Ardakan Plain, Yazd, Iran)

Assessing and monitoring rangelands in arid regions are important and essential tasks in order to manage the desired regions. Nowadays, satellite images are used as an approximately economical and fast way to study the vegetation in a variety of scales. This research aims to estimate the coverage percent using the digital data given by ETM+ Landsat satellite. In late May and early Ju...

متن کامل

Robust Parsing, Error Mining, Automated Lexical Acquisition, And Evaluation

In our attempts to construct a wide coverage HPSG parser for Dutch, techniques to improve the overall robustness of the parser are required at various steps in the parsing process. Straightforward but important aspects include the treatment of unknown words, and the treatment of input for which no full parse is available. Another important means to improve the parser's performance on unexpected...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004